Abstract
Accurately predicting decisions from neural activity recorded under
different stimuli is an important problem. Steinmetz et al. addressed
this problem and found that unequal visual contrasts produced stronger
neural activity and more accurate decisions than roughly equal
contrasts. In this report we first aggregate the data over neurons and
time, and then cluster the neurons to see whether different types of
neurons behave differently. We find differences in both neural activity
and decision making, although the prediction model does not perform at
a desirable level. Nevertheless, the results are enough to motivate
further research into the different types of neurons.
Introduction
Our primary objective is to gain better insight into the neural
activity of a mouse’s visual cortex and how it can be used to predict
the outcome of a trial. More specifically, there are two questions of
interest.
1. The primary question: how do the neurons in the visual cortex
respond to the stimuli presented on the left and right? In particular,
we would like to know whether the left and right stimuli have additive
effects on the neural responses or whether the two interact.
2. The secondary question: is it possible to predict the feedback of
each trial from the stimuli presented?
The main idea to draw from this project is an understanding of neural
activity and of how to predict actions based on how certain variables
affect that activity. Furthermore, as mammals do not vary that much
from each other, this can provide insight into how human neural
patterns are influenced and offer a method for predicting human
feedback. I would also like to add that understanding and predicting
human behavior is very powerful. Though this project may not have
serious consequences, it may lead to more complex projects that could.
Such methods must be used responsibly and morally, as this kind of
information can be used for the wrong reasons.
Background
We study the data from the study titled Distributed coding of choice,
action and engagement across the mouse brain by Nicholas Steinmetz et
al. The article describes how the researchers investigated the way
different parts of the mouse brain encode information related to
decision making, action and engagement.
In this article, Steinmetz et al. recorded neural activity across the
brain while mice performed a task. On each trial, visual stimuli of
varying contrast could appear on the left side, the right side, both
sides, or neither side of the mouse. Mice earned a water reward by
turning a wheel with their paws to indicate which side had the highest
contrast. If neither stimulus was present, they earned a reward for
making a third type of response, keeping the wheel still for 1.5
seconds. This response will not be included in our trials, since the
trial length is at most 0.4 seconds, but we mention it because it
pertains to some of the data recorded. The feedback we will be
observing is binary: a left or a right wheel turn. We also note that
all mice were trained prior to the recordings so that they could get
used to the method.
Some findings we hope to confirm and explore further are that choices
were most accurate when a stimulus appeared on a single side at high
contrast, and that the mice performed worse at low contrast. I will not
go into the biological terminology too much, but different parts of the
brain were observed to react differently to the stimulus. There were
also neuron firings following the wheel turn, which can introduce
noise. Also of note, when the mouse successfully selected a visual
stimulus contralateral to the recorded hemisphere of the brain, the
visual cortex neurons fired first. We will look into some of these
findings, and explore others, as we go through the analysis.
What the article showed was that different groups of neurons were
active in different parts of the brain depending on the type of task
being performed. Also, the activity of neurons was highly correlated
with the behavior of the mice, indicating that patterns of neural
activity were related to their feedback.
Data Overview
In total there were 39 sessions across 10 mice, with recordings taken
from 30,000 neurons in 42 brain regions. Two or three probes were used
at a time to record from different regions of the brain. Our subset of
the data consists of 5 RDS files containing 5 sessions taken from two
different mice, ‘Cori’ and ‘Forssmann’. Each session contains the
following variables:
- mouse_name: name of the mouse.
- date_exp: date of the experiment.
- feedback_type: type of the feedback for each trial in the session, 1
for success and -1 for failure. A success means the mouse spun the
wheel in the correct direction with regard to the stimulus.
- contrast_left: contrast of the left stimulus used for each trial in
the session. Contrasts take the levels 0, 0.25, 0.5 and 1.
- contrast_right: contrast of the right stimulus used for each trial in
the session, at the same levels.
- time: centers of the time bins for spks for each trial in the
session. Generally a 0.4 second interval partitioned into 39 time bins.
- spks: numbers of spikes of neurons in the visual cortex in the time
bins defined in time, for each trial in the session. Stored as a matrix
of neurons by time bins (# of neurons x 39), so the activity of each
neuron is available as a spike train.
Analysis Version 1: Aggregating by neurons and time
Descriptive Analysis
Table for number of trials and neurons per session
This table summarizes our data by session.
Neuron and Trial distribution by Session
| | Session 1 | Session 2 | Session 3 | Session 4 | Session 5 |
|---|---|---|---|---|---|
| Number of Trials | 214 | 251 | 228 | 249 | 254 |
| Number of Neurons | 178 | 533 | 228 | 120 | 99 |
When looking at the number of trials per session, the concern arises of
whether our data set is balanced. By the criteria in Google's machine
learning guide on imbalanced data (see Reference), we have a mildly
imbalanced data set. Since this is a mild case, we will not downsample
or upweight the data. The neuron count per session, however, is much
more unbalanced, and we adjust our methods for this further down in the
analysis.
DataFrame (Collapse on neuron and time variables)
We first build a general table in which we collapse over neurons and
time, to get a sense of what we are looking at and to see if there are
any inferences we can already make. We also want a baseline against
which to compare the other dataframes we will be making. For example,
we will be clustering later on and would like to see the differences
between overall neural activity and cluster activity.
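A minimal sketch of this collapsing step, assuming each session is a list whose spks element holds one neurons-by-time-bins matrix per trial. The helper name collapse_session, the toy data, and the definition of the rate (spikes per neuron per second over a 0.4 second trial) are our assumptions for illustration. The column names follow the model code used later in the report.

```r
# Hypothetical helper: collapse one session into one row per trial.
# Assumes `session$spks` is a list of (neurons x time bins) matrices and
# that each trial spans `trial_length` seconds.
collapse_session <- function(session, session_id, trial_length = 0.4) {
  avg_rate <- vapply(
    session$spks,
    function(m) sum(m) / nrow(m) / trial_length,  # spikes per neuron per second
    numeric(1)
  )
  data.frame(
    session_col     = factor(session_id),
    contrast_left   = factor(session$contrast_left),
    contrast_right  = factor(session$contrast_right),
    feedback_type   = session$feedback_type,
    avg_firing_rate = avg_rate
  )
}

# Toy session with simulated spike counts (not the real recordings):
# 6 trials, 4 neurons, 39 time bins
set.seed(1)
toy <- list(
  spks = replicate(6, matrix(rpois(4 * 39, 0.1), nrow = 4), simplify = FALSE),
  contrast_left  = c(0, 0.25, 0.5, 1, 0, 1),
  contrast_right = c(1, 0, 0.5, 0.25, 0, 0),
  feedback_type  = c(1, -1, 1, 1, -1, 1)
)
df_toy <- collapse_session(toy, session_id = 1)
print(df_toy)
```

Because the rate is divided by the number of neurons, sessions with very different neuron counts end up on the same scale.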
Data Assumptions
Before going further, we emphasize the assumptions we are making.
All neurons are treated equally, since we average over neuron firings.
This is not correct, as we know there are different types of neurons
which are suspected to behave differently. Again, this is just a base
dataframe which we will refine in the second analysis.
All time points are treated equally, since we average over the time
interval.
Contrast levels are treated as factors instead of continuous variables.
We do this because whether a contrast is 0.25 or 0.26 will probably not
make much of a difference, whereas 0.25 vs 1.0 will most likely show an
impact. By keeping the contrasts as factors we can still make the
inferences we need, and we can also apply methods geared towards
categorical data.
A benefit of aggregating over neurons is that we no longer need to
worry about the different number of neurons per session.
Contrast dispersion table
Dispersion table by contrast: Left (rows) and Right (columns)
| | R0.0 | R0.25 | R0.5 | R1.0 |
|---|---|---|---|---|
| L0.0 | 327 | 50 | 84 | 130 |
| L0.25 | 33 | 30 | 40 | 86 |
| L0.5 | 83 | 40 | 32 | 37 |
| L1.0 | 79 | 75 | 36 | 34 |
This is the number of trials for each combination of left and right
contrast. We notice that there are a lot of (0, 0) combinations. This
is probably because a separate experiment tested the reaction of the
mice to a (0, 0) stimulus over 1.5 seconds, which is beyond the scope
of this course. We will see this in histograms later on and consider
removing some of the neurons that do not fire at all in our trials.
Note: The dispersion of left and right contrasts is similar between
sessions, so we will not use space to show each table.
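A table of this kind can be produced with a cross-tabulation. The following is a small sketch on simulated contrasts, not the real trials; the data frame toy and the variable names counts and zero_share are ours.

```r
# Simulated trial-level contrasts (not the real data), using the four levels
set.seed(42)
levels_c <- c(0, 0.25, 0.5, 1)
toy <- data.frame(
  contrast_left  = factor(sample(levels_c, 200, replace = TRUE), levels = levels_c),
  contrast_right = factor(sample(levels_c, 200, replace = TRUE), levels = levels_c)
)

# Rows are left contrast levels, columns are right contrast levels
counts <- xtabs(~ contrast_left + contrast_right, data = toy)
print(counts)

# Share of trials with no stimulus on either side
zero_share <- counts["0", "0"] / sum(counts)
```

Running the same cross-tabulation per session is one way to check that the dispersion is similar between sessions.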
Boxplot of overall sessions
Observations: Boxplot: The boxplot shows that the average neuron firing
rate in sessions 1, 2 and 3 (Cori) is clearly higher than in sessions 4
and 5 (Forssmann). This indicates that averaging over all of our
neurons may not be the best idea, as there are clear differences. More
importantly, the differences are between the mice, suggesting a
biological difference in types of neurons.
Observations: Right Contrast: We notice an overall increase in neuron
firing rate as the right contrast increases. This may indicate a strong
influence of the right contrast on our firing rates and predictions.
Left Contrast: We also notice an upward trend in firing rate with the
left contrast, though it is less obvious.
Interaction Plot: The only thing we can see in this plot is that when
contrast_left = 0, the firing rate increases strictly with the right
contrast. It also appears that the bigger the difference in contrast,
the higher the firing rate. Further study may show this more clearly.
Inferential Analysis
We now test whether the interaction of the contrasts is significant.
Model building
We started out with a mixed effects model:
\[Y_{ijkl} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \eta_k + \epsilon_{ijkl}\]
where
\(\mu\) is the population mean,
\(\alpha_i\) is the fixed effect of the i-th level of left contrast,
\(\beta_j\) is the fixed effect of the j-th level of right contrast,
\((\alpha\beta)_{ij}\) is the interaction term of the contrasts,
\(\eta_k\) is the random effect of the k-th session, and
\(\epsilon_{ijkl}\) is the error term at the level of an individual
observation, with l indexing the trials within a session and contrast
combination.
Model Assumptions
\(\eta_k\sim N(0,\sigma^2_\eta)\) and \(\epsilon_{ijkl}\sim N(0,\sigma^2)\),
with independence between \(\eta_k\) and \(\epsilon_{ijkl}\).
\(H_0: (\alpha\beta)_{ij}=0\) for all combinations of i, j
\(H_A: \exists\, i, j\) s.t. \((\alpha\beta)_{ij}\neq 0\)
Significance level: \(\alpha=0.05\)
Result: We test the interaction term in both a mixed effects and a
fixed effects model. All of our p-values are below \(\alpha\) = 0.05,
so both the random effect and the interaction term of the contrasts are
significant at this level. Note, however, that the p-value of the
interaction term in the mixed effects model is 0.044, so with a smaller
\(\alpha\) it would not be considered significant. We are comfortable
with this significance level, but at a lower one the additive mixed
effects model would be the appropriate choice.
In conclusion, we keep the full model above as our final model.
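As an illustration of the fixed effects version of this test, here is a sketch on simulated data (not our recordings) that compares the additive and full linear models with a partial F-test. The data frame toy and the simulated effect sizes are made up for illustration. The mixed effects version would replace lm with a mixed model fit and is not shown here.

```r
set.seed(7)
levels_c <- c(0, 0.25, 0.5, 1)
# Balanced toy design: 10 simulated trials per contrast combination
toy <- expand.grid(contrast_left = levels_c, contrast_right = levels_c, rep = 1:10)
# Simulated response with purely additive contrast effects plus noise
toy$avg_firing_rate <- 2 + 0.5 * toy$contrast_left + 0.8 * toy$contrast_right +
  rnorm(nrow(toy), sd = 0.3)
toy$contrast_left  <- factor(toy$contrast_left)
toy$contrast_right <- factor(toy$contrast_right)

fit_add  <- lm(avg_firing_rate ~ contrast_left + contrast_right, data = toy)
fit_full <- lm(avg_firing_rate ~ contrast_left * contrast_right, data = toy)

# Partial F-test of H0: all interaction terms are zero
test <- anova(fit_add, fit_full)
p_interaction <- test$`Pr(>F)`[2]
reject_h0 <- p_interaction < 0.05
print(test)
```

With four levels per contrast, the interaction uses (4 - 1) x (4 - 1) = 9 degrees of freedom, which is the Df reported in the second row of the table.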
Sensitivity Analysis
qqnorm(resid(fit))
qqline(resid(fit))
QQ plot: The residuals fall along the reference line, with a slightly
heavy right tail. From this we conclude that normality is a reasonable
assumption.
Reasoning for Clustering
The big takeaway here is that our boxplots give a strong indication of
why a clustering analysis is worthwhile. We had originally averaged our
firing rates over neuron, trial and time segment. In doing so we
assumed that all neurons fire the same way within all trials and time
segments. As our boxplots reveal, this is clearly not the case. We will
therefore run a different version of the analysis in which we cluster
our data by neuron, to see whether we can find different
characteristics that distinguish the two mice. One hypothesis is that a
certain type (or types) of neuron is more present in one mouse than in
the other.
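As a sketch of what clustering by neuron can look like, the example below groups simulated per-neuron mean firing rates with k-means. Both the choice of k-means and the use of a single mean rate per neuron are assumptions made for illustration, since this section does not name the algorithm; the simulated rates are not our data.

```r
set.seed(3)
# Simulated per-neuron mean firing rates in spikes per second (not real data)
neuron_rate <- c(rexp(150, rate = 2), rnorm(40, mean = 5, sd = 0.5),
                 rnorm(20, mean = 10, sd = 1), rnorm(10, mean = 20, sd = 2))

# k-means with 4 centers; nstart restarts guard against poor initializations
km <- kmeans(matrix(neuron_rate, ncol = 1), centers = 4, nstart = 25)

# Cluster sizes and mean rate per cluster
cluster_sizes <- table(km$cluster)
cluster_means <- tapply(neuron_rate, km$cluster, mean)
print(cluster_sizes)
print(round(cluster_means, 2))
```

Tabulating the cluster labels against the mouse each neuron belongs to would then show whether one cluster is more present in one mouse.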
Analysis Version 2:
Clustering by neuron-type
Inferential
analysis
Removal of random term
We were originally fitting a mixed effects model. However, as discussed
before, treating session as random is questionable in this case. We
only have 2 mice, so we are not comfortable relying on a random
intercept. We also assume that sessions coming from the same mouse are
not random, since neurons belonging to the same mouse’s brain should
not differ that much from each other.
We therefore try a fixed effects model here.
Model building
\[Y_{ijkl} = \mu + \eta_k + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijkl}\]
where
\(\mu\) is the population mean,
\(\alpha_i\) is the fixed effect of the i-th level of left contrast,
\(\beta_j\) is the fixed effect of the j-th level of right contrast,
\((\alpha\beta)_{ij}\) is the interaction term of the contrasts,
\(\eta_k\) is the random effect of the k-th session, which is the term
we attempt to remove, and
\(\epsilon_{ijkl}\) is the error term at the level of an individual
observation.
We again test whether the interaction of the contrasts is significant,
this time within each cluster.
Assumption: \(\epsilon_{ijkl}\sim N(0,\sigma^2)\)
\(H_0: (\alpha\beta)_{ij}=0\) for all combinations of i, j
\(H_A: \exists\, i, j\) s.t. \((\alpha\beta)_{ij}\neq 0\)
Significance level: \(\alpha=0.05\)
library(lme4)  # provides lmer()
# Mixed effects fits with a random intercept per session, one per cluster
fitr1=lmer(avg_firing_rate~(1|session_col)+contrast_left*contrast_right, data = df_cluster1)
fitr2=lmer(avg_firing_rate~(1|session_col)+contrast_left*contrast_right, data = df_cluster2)
fitr3=lmer(avg_firing_rate~(1|session_col)+contrast_left*contrast_right, data = df_cluster3)
fitr4=lmer(avg_firing_rate~(1|session_col)+contrast_left*contrast_right, data = df_cluster4)
# Fixed effects fits without the session term, one per cluster
fit1=lm(avg_firing_rate~contrast_left*contrast_right, data = df_cluster1)
fit2=lm(avg_firing_rate~contrast_left*contrast_right, data = df_cluster2)
fit3=lm(avg_firing_rate~contrast_left*contrast_right, data = df_cluster3)
fit4=lm(avg_firing_rate~contrast_left*contrast_right, data = df_cluster4)
# Likelihood ratio test of the random intercept, shown for cluster 1
afr1f1=anova(fitr1, fit1)$`Pr(>Chisq)`
afr1f1[2]
## [1] 0
We attempt to do away with our random effect term. However, when we
test the mixed effects model against the fixed effects model, the
p-value prints as 0 (shown above for cluster 1), which gives us a
strong indication to reject the null hypothesis for each cluster and
keep the random effect term. We had originally hoped to drop the random
term and use a fixed effects model for simplicity, but that would
result in a less accurate model.
# ANOVA table for the mixed effects fit of each cluster
a_fr1=anova(fitr1)
a_fr2=anova(fitr2)
a_fr3=anova(fitr3)
a_fr4=anova(fitr4)
When testing the significance of the interaction term in each cluster,
we come to an interesting finding. In 3 of our clusters the interaction
term is not significant, whereas in cluster 2 it is significant, with a
p-value of 0.0454574. Therefore, for clusters 1, 3 and 4 we get an
additive mixed effects model, whereas for cluster 2 we keep the full
mixed effects model.
Sensitivity Analysis
par(mfrow=c(2,2))
qqnorm(resid(fitr1), main = "Cluster 1")
qqline(resid(fitr1))
qqnorm(resid(fitr2), main = "Cluster 2")
qqline(resid(fitr2))
qqnorm(resid(fitr3), main = "Cluster 3")
qqline(resid(fitr3))
qqnorm(resid(fitr4), main = "Cluster 4")
qqline(resid(fitr4))

Normality seems to hold for cluster 1, but for clusters 2, 3 and 4 we
see a violation. We also note that clusters 1 and 2 are much larger
than the others, so by the CLT the cluster sizes may account for the
differences we see above. Notice how the smaller the cluster, the less
normal the plot appears.
Predictive modeling
We aim to predict whether a mouse responds correctly based on the
combination of contrast stimuli as well as the mean firing rate.
We chose to treat all of our terms as continuous instead of
categorical, because otherwise we would have too many coefficients,
which would make our model hard to interpret.
\[\operatorname{logit}(p_i) = \beta_0+(\beta_1X_1+\beta_2X_2 +
\beta_3X_3 + \beta_4X_4)+\beta_5X_5 + \beta_6X_6 +
\beta_7X_5X_6\]

# Calculate the AUC
pp=round(performance(prediction, "auc")@y.values[[1]],4)
We achieve an AUC of 0.6047 from our ROC curve, which is better than
the 0.5 expected from a 50/50 guess but is still not a high value.
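To make the AUC concrete, here is a base R sketch on simulated trials (not our data). It fits a logistic regression with glm and computes the in-sample AUC through the rank-sum identity, instead of the performance() call used above. The simulated outcome, the predictor set, and the helper name auc are assumptions for illustration.

```r
set.seed(11)
n <- 400
toy <- data.frame(
  contrast_left   = sample(c(0, 0.25, 0.5, 1), n, replace = TRUE),
  contrast_right  = sample(c(0, 0.25, 0.5, 1), n, replace = TRUE),
  avg_firing_rate = rgamma(n, shape = 2, rate = 1)
)
# Simulated outcome: success is more likely when the contrasts differ
p_true <- plogis(-0.2 + 1.5 * abs(toy$contrast_left - toy$contrast_right))
toy$success <- rbinom(n, 1, p_true)

# Logistic regression with continuous predictors and a contrast interaction
fit <- glm(success ~ avg_firing_rate + contrast_left * contrast_right,
           data = toy, family = binomial)
p_hat <- fitted(fit)  # in-sample predicted probabilities

# AUC: probability that a random success is scored above a random failure
auc <- function(score, label) {
  r  <- rank(score)  # average ranks handle tied scores
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}
auc_value <- auc(p_hat, toy$success)
print(round(auc_value, 4))
```

An AUC of 0.5 corresponds to random ranking and 1 to perfect separation, so a value near 0.6 sits much closer to chance than to a reliable classifier.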
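# Table of cutoffs with the false and true positive rate at each one
cutpoints=data.frame(cut=perf@alpha.values[[1]], fpr=perf@x.values[[1]],
tpr=perf@y.values[[1]])
# Select the chosen cutoff (row 680) and read off its rates
cutpoint=cutpoints[cutpoints$cut==cutpoints$cut[680],]
sensitivity=round(cutpoint[1,3],3)    # TPR
specificity=round(1-cutpoint[1,2],3)  # 1 - FPR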
Our decision rule is to take the cutpoint where we get a fairly high
specificity together with the best ratio between our TPR and FPR. Given
the ROC curve as well as the FPR and TPR ratios, we choose the cutpoint
of 0.6297392, with a sensitivity of 0.682 and a specificity of 0.494.
These are not the highest values we would like: the FPR is about what
we would get by chance, but the sensitivity is decent. This is the best
we can do with this prediction model, so we accept it here.
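The two rates can be checked by hand for any cutoff. The sketch below uses a hypothetical helper sens_spec and six made-up scores; it assumes the rule "predict success when the score is at least the cutoff", which may differ slightly from the convention of the package used above.

```r
# Hypothetical helper: rates for the rule "predict success if score >= cutoff"
sens_spec <- function(score, label, cutoff) {
  pred <- as.integer(score >= cutoff)
  c(sensitivity = sum(pred == 1 & label == 1) / sum(label == 1),
    specificity = sum(pred == 0 & label == 0) / sum(label == 0))
}

# Tiny hand-checkable example with made-up scores
score <- c(0.2, 0.4, 0.6, 0.7, 0.8, 0.9)
label <- c(0,   1,   0,   1,   1,   1)
res <- sens_spec(score, label, cutoff = 0.65)
print(res)
# sensitivity = 0.75 (3 of 4 successes flagged), specificity = 1 (both failures rejected)
```

Raising the cutoff trades sensitivity for specificity, which is the trade-off weighed when choosing the cutpoint.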
Discussion:
To summarize, our study has provided some interesting insights into
the neural activity and decision making of mice, some of which we had
expected based on the background information. First, we noticed that
the mice differed in their average firing rates and that firing rates
were sensitive to contrast level, especially the right contrast. It may
be that the mice respond more strongly to contrast on one side of the
brain than the other; a neurobiologist would need to comment on that.
Our clustering method found 4 clusters of neurons based on their firing
rates. The difference in cluster ratios between the mice was an
interesting find. We saw that the mice differed in their most active
neuron type and that Cori had a much higher proportion of neuron type 2
than Forssmann. Though it is beyond the scope of this project, as we do
not have the data, the history of each mouse may give some
neurobiological insight into this difference.
Second, neuron firing seems to be sensitive to higher contrast on the
left, the right, or both sides, which makes sense, as the optic nerve
would be more affected by a bigger distortion of light. We also
confirmed that feedback percentage depends on contrast. Mice made the
correct decision more often when there was a stark difference in
contrasts, and markedly less often when the contrasts were of equal
value. This makes sense, as it is easier to separate right from left
when the light distortions differ drastically. Overall, our analysis
has confirmed what the background information suggested. In future
projects it may be useful to analyze the feedback percentage of each
mouse to see whether one is correct more often than the other. Mouse
history aside, a difference may point to a role of neuron type 2 in
cognitive ability.
Acknowledgements
Methodologies for tackling data structures, grouping methods and
assumptions were discussed with Professor Chen, Matthew Chen (no
relation), and Jasper Tsai.
Reference
Steinmetz, N.A., Zatka-Haas, P., Carandini, M. et al. Distributed
coding of choice, action and engagement across the mouse brain. Nature
576, 266–273 (2019). https://doi.org/10.1038/s41586-019-1787-x
Allen, W. E., Kauvar, I. V., Chen, M. Z., Richman, E. B., Yang, S. J.,
Chan, K., Gradinaru, V., Deverman, B. E., Luo, L., & Deisseroth, K.
(2017, May 17). Global representations of goal-directed behavior in
distinct cell types of mouse neocortex. Neuron. Retrieved February 18,
2023, from https://pubmed.ncbi.nlm.nih.gov/28521139/
Google. (n.d.). Imbalanced data | machine learning | google
developers. Google. Retrieved February 10, 2023, from https://developers.google.com/machine-learning/data-prep/construct/sampling-splitting/imbalanced-data
Animal Nerve Cells. CliffsNotes. (n.d.). Retrieved March 4, 2023,
from https://www.cliffsnotes.com/study-guides/biology/biology/nervous-coordination/animal-nerve-cells#:~:text=There%20are%20three%20types%20of,%2C%20interneurons%2C%20and%20motor%20neurons.
Wikimedia Foundation. (2023, March 2). Visual cortex. Wikipedia.
Retrieved March 18, 2023, from https://en.wikipedia.org/wiki/Visual_cortex#:~:text=The%20primary%20visual%20cortex%20is%20divided%20into%20six%20functionally%20distinct,4B%2C%204C%CE%B1%2C%20and%204C%CE%B2.
All clusters exhibit similar reaction to contrast differences.